NSF PAR Search | NSF Public Access Repository

An Efficient Linear Mixed Model Framework for Meta-Analytic Association Studies Across Multiple Contexts

Jew, Brandon; Li, Jiajin; Sankararaman, Sriram; Sul, Jae-Hoon (January 2021, 21st International Workshop on Algorithms in Bioinformatics, WABI)

Linear mixed models (LMMs) can be applied in meta-analyses of responses from individuals across multiple contexts, increasing power to detect associations while accounting for confounding effects arising from within-individual variation. However, traditional approaches to fitting these models can be computationally intractable. Here, we describe an efficient and exact method for fitting a multiple-context linear mixed model. Whereas existing exact methods may be cubic in their time complexity with respect to the number of individuals, our approach for multiple-context LMMs (mcLMM) is linear. These improvements allow for large-scale analyses that require orders of magnitude less computing time and memory than existing methods. As examples, we apply our approach to identify expression quantitative trait loci from large-scale gene expression data measured across multiple tissues, as well as to joint analyses of multiple phenotypes in genome-wide association studies at biobank scale.
Full Text Available
Variant calling and quality control of large-scale human genome sequencing data

https://doi.org/10.1042/ETLS20190007

Pellegrini, Matteo; Jew, Brandon; Sul, Jae Hoon (July 2019, Emerging Topics in Life Sciences)

Next-generation sequencing has allowed genetic studies to collect genome sequencing data from a large number of individuals. However, raw sequencing data are not usually interpretable due to fragmentation of the genome and technical biases; therefore, analysis of these data requires many computational approaches. First, for each sequenced individual, sequencing data are aligned and further processed to account for technical biases. Then, variant calling is performed to obtain information on the positions of genetic variants and their corresponding genotypes. Quality control (QC) is applied to identify individuals and genetic variants with sequencing errors. These procedures are necessary to generate accurate variant calls from sequencing data, and many computational approaches have been developed for these tasks. This review will focus on current widely used approaches for variant calling and QC.
Full Text Available
Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

https://doi.org/10.1038/s41598-020-67513-5

Alvarez, Marcus; Rahmani, Elior; Jew, Brandon; Garske, Kristina M.; Miao, Zong; Benhammou, Jihane N.; Ye, Chun Jimmie; Pisegna, Joseph R.; Pietiläinen, Kirsi H.; Halperin, Eran; et al (December 2020, Scientific Reports)
Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. We observe that snRNA-seq is commonly subject to contamination by high amounts of ambient RNA, which, if overlooked, can lead to biased downstream analyses such as the identification of spurious cell types. We present a novel approach to quantify contamination and filter droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: (1) human differentiating preadipocytes in vitro, (2) fresh mouse brain tissue, and (3) human frozen adipose tissue (AT) from six individuals. All three data sets showed evidence of extranuclear RNA contamination, and we observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq, our clustering strategy also successfully filtered single-cell RNA-seq data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
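The abstract above says only that debris and cell-type expression distributions are estimated with EM. As a rough illustration of that idea, and not the DIEM implementation, the sketch below fits a two-component multinomial mixture to a small droplet-by-gene count matrix. The function name `em_two_component`, the initialisation by total count, the pseudocount, and the toy counts are all assumptions made for this example.

```python
import math


def em_two_component(counts, n_iter=25, pseudocount=0.01):
    """Fit a two-component multinomial mixture to droplet-by-gene counts.

    Component 0 is seeded from low-count droplets (a stand-in for debris),
    component 1 from the rest. This seeding is an assumption of the sketch.
    """
    n_droplets, n_genes = len(counts), len(counts[0])
    totals = [sum(row) for row in counts]
    cutoff = sorted(totals)[n_droplets // 2]
    # Hard initial responsibilities: [P(debris), P(nucleus)] per droplet.
    resp = [[1.0, 0.0] if t < cutoff else [0.0, 1.0] for t in totals]

    for _ in range(n_iter):
        # M-step: mixing weights and per-component gene profiles.
        weights, profiles = [], []
        for k in range(2):
            weights.append(max(sum(r[k] for r in resp) / n_droplets, 1e-12))
            gene_sums = [
                pseudocount + sum(resp[i][k] * counts[i][j] for i in range(n_droplets))
                for j in range(n_genes)
            ]
            total = sum(gene_sums)
            profiles.append([s / total for s in gene_sums])

        # E-step: posterior membership of each droplet, computed in log space.
        for i, row in enumerate(counts):
            log_joint = [
                math.log(weights[k])
                + sum(c * math.log(profiles[k][j]) for j, c in enumerate(row))
                for k in range(2)
            ]
            peak = max(log_joint)
            unnorm = [math.exp(v - peak) for v in log_joint]
            norm = sum(unnorm)
            resp[i] = [u / norm for u in unnorm]

    return resp, profiles, weights


# Toy data (made up): three sparse droplets dominated by gene 0,
# three rich droplets dominated by genes 1 and 2.
counts = [
    [8, 1, 1], [9, 0, 1], [7, 2, 1],
    [5, 40, 55], [4, 50, 46], [6, 45, 49],
]
resp, profiles, weights = em_two_component(counts)
debris_flags = [r[0] > 0.5 for r in resp]
```

Droplets whose posterior probability for component 0 exceeds 0.5 would be the ones to filter in this toy setting; a real analysis would need more components for distinct cell types and a more careful initialisation.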
Full Text Available
ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest

https://doi.org/10.1371/journal.pcbi.1007556

Li, Jiajin; Jew, Brandon; Zhan, Lingyu; Hwang, Sungoo; Coppola, Giovanni; Freimer, Nelson B.; Sul, Jae Hoon; Pertea, Mihaela (December 2019, PLOS Computational Biology)

Full Text Available
Leveraging allelic imbalance to refine fine-mapping for eQTL studies

https://doi.org/10.1371/journal.pgen.1008481

Zou, Jennifer; Hormozdiari, Farhad; Jew, Brandon; Castel, Stephane E.; Lappalainen, Tuuli; Ernst, Jason; Sul, Jae Hoon; Eskin, Eleazar; Wen, Xiaoquan (December 2019, PLOS Genetics)

Full Text Available
The causal effect of obesity on prediabetes and insulin resistance reveals the important role of adipose tissue in insulin resistance

https://doi.org/10.1371/journal.pgen.1009018

Miao, Zong; Alvarez, Marcus; Ko, Arthur; Bhagat, Yash; Rahmani, Elior; Jew, Brandon; Heinonen, Sini; Muñoz-Hernandez, Linda Liliana; Herrera-Hernandez, Miguel; Aguilar-Salinas, Carlos; et al (September 2020, PLOS Genetics)
Hauser, Elizabeth R. (Ed.)
Full Text Available
Accurate estimation of cell composition in bulk expression through robust integration of single-cell information

https://doi.org/10.1038/s41467-020-15816-6

Jew, Brandon; Alvarez, Marcus; Rahmani, Elior; Miao, Zong; Ko, Arthur; Garske, Kristina M.; Sul, Jae Hoon; Pietiläinen, Kirsi H.; Pajukanta, Päivi; Halperin, Eran (April 2020, Nature Communications)

We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) or single-nucleus RNA-seq (snRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is significant technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and snRNA-seq data, Bisque replicates previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. We further propose an additional mode of operation that merely requires a set of known marker genes.
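The abstract does not spell out what a gene-specific bulk expression transformation looks like. One plausible reading, offered here as an assumption and not as Bisque's actual code, is a per-gene linear rescaling that makes bulk expression match the mean and spread of the expression implied by the single-cell reference. The names `fit_gene_transform` and `transform_bulk`, the dictionary layout, and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def fit_gene_transform(bulk_values, target_values):
    """Return (slope, intercept) that maps one gene's bulk expression onto
    the mean and standard deviation of the target values."""
    sd_bulk = pstdev(bulk_values)
    sd_target = pstdev(target_values)
    # A constant gene carries no scale information; leave its scale unchanged.
    slope = sd_target / sd_bulk if sd_bulk > 0 else 1.0
    intercept = mean(target_values) - slope * mean(bulk_values)
    return slope, intercept


def transform_bulk(bulk, target):
    """Apply a separate linear transformation to every gene.

    bulk, target: dict mapping gene name -> list of expression values
    across samples (hypothetical layout chosen for this sketch).
    """
    transformed = {}
    for gene, values in bulk.items():
        slope, intercept = fit_gene_transform(values, target[gene])
        transformed[gene] = [slope * v + intercept for v in values]
    return transformed


# Toy values (made up): bulk is on a different scale from the
# single-cell-derived target expression.
bulk = {"GENE_A": [10.0, 20.0, 30.0], "GENE_B": [5.0, 5.0, 8.0]}
target = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [0.2, 0.4, 0.9]}
adjusted = transform_bulk(bulk, target)
```

After this step each gene in `adjusted` shares the mean and standard deviation of its target, so a downstream regression against the reference profile would operate on comparable scales.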
A machine learning algorithm to increase COVID-19 inpatient diagnostic capacity

https://doi.org/10.1371/journal.pone.0239474

Goodman-Meza, David; Rudas, Akos; Chiang, Jeffrey N.; Adamson, Paul C.; Ebinger, Joseph; Sun, Nancy; Botting, Patrick; Fulcher, Jennifer A.; Saab, Faysal G.; Brook, Rachel; et al (September 2020, PLOS ONE)
Urbanowicz, Ryan J. (Ed.)
Full Text Available
